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Abstract.  We  derive  an  implementable  algorithm  for  solving  nonlinear  stochastic  opti¬ 


mization  problems  with  failure  probability  constraints  using  sample  average  approxima¬ 


tions.  The  paper  extends  prior  results  dealing  with  a  failure  probability  expressed  by  a 


single  measure  to  the  case  of  failure  probability  expressed  in  terms  of  multiple  performance 


measures.  We  also  present  a  new  formula  for  the  failure  probability  gradient.  A  numerical 


example  addressing  the  optimal  design  of  a  reinforced  concrete  highway  bridge  illustrates 


the  algorithm. 
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1  Introduction 


This  paper  focuses  on  a  class  of  decision-making  problems  frequently  arising  in  design  and 


maintenance  optimization  of  mechanical  structures  such  as  bridges,  building  frames  and 


aircraft  wings.  Let  x  £  IRn  be  a  vector  of  design  variables,  e.g.  related  to  the  size  and 


form  of  the  structure,  let  c  :  IRn  — >  1R  be  a  deterministic,  continuously  differentiable  cost 


function,  and  let  p  :  IR"  — >  [0, 1]  be  a  failure  probability  (to  be  defined  below).  Then,  the 


optimal  design  problem  is  defined  as  a  chance-constrained  stochastic  program  in  the  form: 


(P)  min  {c(x)  I  p(x)  <  q,  x  £  X}  ,  (1) 


where  q  is  a  bound  on  the  failure  probability,  X  =  {x  £  IRn  |  fj(x)  <  0 ,j  e  J},  and 
fj  :  1R?1  — *  1R,  j  G  J  =  {1,2, ...,  J},  are  deterministic,  continuously  differentiable  functions. 


Mechanical  structures  are  assessed  using  one  or  more  performance  measures,  e.g.,  dis¬ 


placement  and  stress  levels  at  various  locations  in  the  structure.  In  Ref.  1,  we  focused  on 


the  case  were  failure  probability  is  defined  as  the  probability  of  one  performance  measure 


being  unsatisfactory.  In  this  paper,  we  consider  the  general  case  of  “system  failure  proba¬ 


bility”  where  failure  probability  is  defined  by  a  collection  of  performance  measures.  Here, 


“failure”  occurs  when  specific  combinations  of  the  performance  measures  are  unsatisfactory. 
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Let  gk  :  IRn  x  ]Rm  — *  ]R,  k  G  K  =  {1,2,..., IP},  be  a  collection  of  performance  functions 
describing  the  relevant  performance  measures.  The  functions  <%,( •,  •)  depend  on  the  design 
x  G  !Rn  and  a  standard  normal  random  m-vector  u.  This  random  vector  incorporates 


the  uncertainty  in  the  structure  and  its  environment.  Note  that  random  vectors  governed 
by  distributions  such  as  the  multivariate  normal  (possibly  with  correlation)  and  lognormal 
distributions  can  be  transformed  into  a  standard  normal  vector  using  a  smooth  bijective 


mapping.  Hence,  the  limitation  to  a  multivariate  standard  normal  distribution  is  in  many 
applications  not  restrictive  (see  e.g.  Chapter  7  of  Ref.  2  and  Refs.  3-5). 

By  convention,  gk(x,u)  <  0  represents  unsatisfactory  performance  of  the  k- th  measure. 
Formally,  let  the  probability  space  (IRm,77.m,IP)  be  defined  in  terms  of  the  sample  space 
]Rm,  the  Borel  sets  on  ]Rm,  denoted  lZm,  and  the  multivariate  standard  normal  distribution 
IP  of  u.  Assuming  that  gk(x.  •)  is  measurable  for  all  x  G  IR"  and  k  G  K,  we  define  the 
failure  probability  of  the  structure  by  p(x)  =  IP[J?r(x)],  where  the  failure  domain 


F{x)  =  |J  P|  {u  G  1R"1  |  gk{x,u)  <  0},  (2) 

»ei  fceCi 

with  C i  C  K  and  I  =  {1,2,...,/}  defining  the  combinations  of  performance  measures  that 
leads  to  structural  failure.  In  Ref.  1,  we  focused  on  the  case  I\  =  1,  Ci  =  {!},  and  1  =  1. 
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Solution  approaches  for  stochastic  programs  tend  to  be  based  on  either  interior  or  ex¬ 


terior  sampling.  Interior  sampling  approaches  aim  to  solve  stochastic  programs  directly 


and  resort  to  sampling  techniques  whenever  the  algorithm  requires  the  evaluation  of  prob¬ 


ability  functions,  or  more  generally,  expectations.  Usually,  different  samples  are  generated 


each  time  an  evaluation  is  necessary.  In  this  group  of  approaches  we  find  stochastic  quasi¬ 


gradient  methods  (Refs.  6-8).  These  methods  are  difficult  to  apply  to  problems  involving 


failure  probability  constraints.  In  principle,  such  constraints  can  be  removed  by  penalty  or 


barrier  terms  in  the  objective  function.  However,  the  details  of  an  implementable  algorithm 


for  nonlinear  problems  based  on  this  principle  do  not  appear  to  have  been  worked  out. 


Exterior  sampling  techniques  construct  and  solve  a  sample  average  approximation  with¬ 


out  further  sampling  during  the  optimization.  The  results  available  for  exterior  sampling 


techniques  include  the  fact  that  minimizers  and  minimum  values  of  sample  average  ap¬ 


proximations  converge  with  probability  one  to  minimizers  and  the  minimum  value  of  the 


original  problem,  respectively,  as  the  number  of  samples  goes  to  infinity  (see  Chapter  6  of 


Ref.  9  and  references  therein).  For  techniques  for  checking  whether  a  given  point  is  close  to 


stationarity  see  e.g.  Section  6.4  of  Ref.  9.  These  results  provide  guidance  for  the  selection 
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of  approximating  problems  to  be  solved  using  some  deterministic  optimization  algorithm. 


The  authors  of  Ref.  10  present  a  framework  based  on  sample  average  approximations, 


for  proving  convergence  of  nonlinear  programs  involving  probabilities  or  expectations.  Only 


a  rather  weak  sense  of  convergence  is  presented  in  Ref.  10:  every  accumulation  point  of 


the  sequence  of  function  values  generated  by  the  algorithm  is  bounded  from  above  by  the 


largest  function  value  at  any  stationary  point.  In  addition,  the  details  of  an  implementable 


algorithm  for  problems  with  failure  probability  constraints  are  not  provided. 


Recently,  elements  of  interior  and  exterior  sampling  techniques  have  been  combined.  In 


Ref.  11,  sample  average  approximations  are  used  to  derive  a  gradient-type  search  direction 


for  nonlinear  stochastic  programs.  For  each  iteration,  re-sanrpling  is  performed  to  generate 


a  new  search  direction.  Under  the  assumption  of  convex,  twice-differentiable  functions,  it  is 


shown  that  the  expected  distance  to  a  Karush-Kuhn-' Tucker  point  vanishes  with  increasing 


iterations  when  the  search  direction  is  combined  with  sufficiently  small  stepsizes.  However, 


it  is  not  clear  how  to  implement  a  satisfactory  stepsize  rule.  In  addition,  the  literature 


contains  a  large  number  of  approximate  or  heuristic  approaches  for  solving  (P)  and  similar, 


more  specialized  problems.  A  review  of  such  results  is  found  in  Ref.  12. 
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The  failure  probability  is  continuously  differentiable  under  broad  conditions  when  J-(x) 


is  bounded  and  given  by  a  union  of  events  (Ref.  13).  However,  the  derivative  formula  in 
Ref.  13  is  difficult  to  use  in  estimation  because  it  may  involve  surface  integrals.  In  Ref.  14 
(see  also  Ref.  8),  an  integral  transformation  is  presented,  which,  when  it  exists,  leads  to  a 
simple  formula  for  Vp(x).  However,  it  is  not  clear  under  what  conditions  the  transformation 
exists.  As  in  Ref.  13,  Ref.  15  assumes  that  J-(x)  is  bounded  and  given  by  a  union  of  events. 
With  this  restriction,  a  formula  for  Vp(x)  involving  integration  over  a  simplex  is  derived.  In 
principle,  this  integral  can  be  evaluated  by  Monte  Carlo  methods.  However,  to  the  authors’ 


knowledge,  there  is  no  computational  experience  with  estimation  of  failure  probabilities  for 


highly  reliable  mechanical  structures  using  this  formula.  In  Section  9.2  of  Ref.  2,  a  formula 
for  Vp(x)  is  suggested,  without  a  complete  proof,  for  the  case  K  =  1.  This  formula  is  based 
on  an  expression  for  p(x)  that  has  been  found  computationally  efficient  in  applications. 

In  the  next  section,  we  generalize  the  special-case  formula  for  Vp(x)  found  in  Ref.  2  and 
provide  a  proof.  We  also  present  estimators  for  p(x)  and  Vp(x)  and  discuss  their  properties. 


For  completeness,  Section  3  presents  Algorithm  Model  3.3.27  from  Ref.  16,  which  we  use  to 
develop  our  algorithm  for  solving  (P).  Section  4  derives  an  implementation  of  the  algorithm 
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model  by  generalizing  the  results  of  Ref.  1.  Section  5  presents  a  numerical  example. 


2  Failure  Probability 


As  indicated  in  Section  1,  a  significant  difficulty  is  associated  with  deriving  a  tractable 
formula  for  the  gradient  of  p(x)4.  To  overcome  this  difficulty,  we  decompose  the  vector  u 
into  a  direction  w  and  a  positive  length  r  as  described  in  further  detail  below  (see  Ref.  17 
for  the  first  application  of  such  an  decomposition) . 

We  need  the  component  failure  domain  J~k(x)  defined  by  J-k(x)  =  {«£  IRm  |  (jk{x.  u)  < 
0},  k  E  K,  and  the  surface  of  the  unit  hypersphere  denoted  by  IB  =  {w  E  IRm  |  \\w\\  =  1}. 
The  following  assumption  is  sufficient  to  ensure  equivalence  between  p(x)  and  its  alternative 
expression  in  Proposition  2.1  below. 


Assumption  2.1.  We  assume  that 


(i)  the  distribution  IP  of  the  random  m-vector  u  is  standard  normal,  and 


(ii)  for  a  given  S  C  IRn,  Tk(x)c  (=  1R"1  —  J~k{x))  is  star-shaped  with  respect  to  the  origin 

4We  will  not  make  use  of  this  fact,  but  note  that  p(x)  can  be  estimated  using  standard  Monte  Carlo 


techniques  independently  of  the  results  in  this  section. 


for  all  k  £  K  and  x  £  S,  i.e. ,  for  all  x  £  5,  0  £  Ek{x)c  and  for  every  w  £  IB  there 


either  exists  a  unique  r  >  0  such  that  gk(x,  rw)  =  0  or  gk(x,  rw )  /  0  for  all  r  >  0. 


In  view  of  Assumption  2. l(i)  and  the  fact  that  mechanical  structures  have  small  failure 
probabilities,  we  conclude  that  Ek(x)  tends  to  be  located  far  from  the  “high  probability 
region”  close  to  0  £  ]Rm.  Hence,  the  condition  that  0  £  J-k{x)c  is  almost  always  satisfied  for 
mechanical  structures.  The  second  part  of  Assumption  2.1  (ii)  is  satisfied  when  gk{x,  •)  is 
affine  for  all  x  £  S  and  IgK,  which  is  approximately  true  for  many  mechanical  structures 
(Sections  4.1  and  5.2  of  Ref.  2).  However,  in  general,  it  can  be  difficult  to  verify  the  second 
part  of  Assumption  2.1(h)  analytically.  This  is  especially  the  case  when  fjg,(-,-),fc  £  K, 
are  given  by  the  solutions  of  some  (differential)  equations.  Nevertheless,  it  is  possible  to 
obtain  numerical  indications  of  the  validity  of  Assumption  2.1(h)  by  estimating  the  failure 
probability  using  an  alternative  estimator  as  explained  below.  We  also  note  that  equivalent 


assumptions  were  adopted  by  Refs.  15,  18,  19  and  Section  9.2  of  Ref.  2. 


Let  P  and  E  be  the  uniform  distribution  on  IB  and  the  corresponding  expectation, 
respectively.  Furthermore,  we  define  :  !Rn  x  IB  — >  [0,  oo],  k  £  K,  to  be  the  smallest 
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nonnegative  solution  of  gk(x,  rw)  =  0  for  a  given  x  £  1RTI  and  w  £  IB,  i.e. , 

min{r  |  gk(x,rw )  <  0,  r  >  0},  if  {r  \  gk(x,rw )  <  0,r  >  0}  /  0, 

r 

oo,  otherwise. 

Note  that  under  Assumption  2.1(h),  0  £  Tk(x)c  and  hence  rk(x,w)  >  0. 


rk(x,w)  =  < 


(3) 


Proposition  2.1.  If  Assumption  2.1  holds  at  x  £  IRn,  then 


P(x)  =  E[(f>(x,w)], 


(4) 


where 


cj)(x,  w )  =  max  min{l  -  Xm(rk{x> w ))} 


(5) 


and  Xm(')  is  the  Chi-square  cumulative  distribution  function  with  m  degrees  of  freedom. 
Proof:  As  in  Ref.  17  (see  alternatively  Refs.  18,  19,  and  Section  9.2  of  Ref.  2),  we  observe 
that,  if  the  standard  normal  random  vector  u  =  rw  and  r2  is  Chi-square  distributed  with  m 
degrees  of  freedom,  then  w  is  a  random  vector,  independent  of  r,  uniformly  distributed  over 
the  surface  of  the  m-dimensional  unit  hypersphere.  Hence,  using  an  equivalent  minimax 
expression  of  (2)  and  the  total  probability  rule  we  obtain  that 

(6) 


II 

"5 

Pr 

<  min  max  qh[x,  rw)  <  0  > 

w 

(  ie I  feC,  J 
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where  Pr  is  the  Chi-square  distribution.  In  view  of  Assumption  2.1,  the  expression  inside 


the  expectation  in  (6)  equals  1  —  %^(r2(x,  rc)),  where  r(x,w)  =  minjgi  maxfcGCi  ^(x,  w). 
Geometrically,  r(x,w )  is  the  minimum  distance  in  direction  w  from  0  £  IRm  to  J-(x).  Since 
Xm(')  and  the  s<luare  function  (positive  domain)  are  strictly  increasing,  the  result  follows. □ 


Informally,  we  note  that  (j)(x,  w)  is  the  probability  of  a  failure  event  in  the  direction  of 
w  for  a  given  x.  We  also  observe  that  when  Assumption  2.1  (ii)  is  not  satisfied,  (4)  may 
overestimate  the  failure  probability.  Hence,  it  is  conservative  to  assume  that  Assumption 
2.1  (ii)  is  satisfied.  For  a  given  x  £  IRn,  it  is  possible  to  get  an  indication  whether  Assumption 
2.1(h)  holds  by  computing  an  estimate  Ij?(x)(uj)/N  of  p(x),  where  U2,  are 

independent,  identically  distributed  standard  normal  vectors  and  Ip(x){uj)  =  1  if  Uj  £ 
J-(x),  and  zero  otherwise.  If  this  estimate  is  significantly  smaller  than  the  one  of  (4),  then 
Assumption  2.1(ii)  is  not  satisfied. 

For  a  given  x  £  IRn  and  w  £  IB,  let  the  set  of  active  performance  functions 


K(x,  w)  =  {k  £  K| k  £  C i(x,  w),i  £  I(x,  re)}  (7) 
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where 


I(x,  w)  =  <  i  G  I 


min  max  rk(x,  w )  =  maxr^r,  re)  >  , 
i'Gl  fc£C.-/  /cGC^ 


Cj(x,  w)  =  <  k  G  C 


max  rk'(x,  re)  =  rk(x,  w) 
fc'G  Cf 


Furthermore,  let  Ak{x)  be  the  re-directions  where  A:  is  active,  i.e., 


Ak(x)  =  (re  G  IB  |  k  G  K(x,  re)}. 


(8) 

(9) 


(10) 


We  now  show  that  p(-)  is  continuously  differentiable  under  the  following  assumptions. 


Assumption  2.2.  We  assume  that  for  a  given  set  S  C  IRn,  the  following  hold: 


(i)  there  exists  a  constant  C\  <  oo  such  that  minjgi  max^C;  fk(x,  re)  <  C\  for  all  x  G  S 
and  re  G  IB, 


(ii)  the  performance  functions  gk(x,u),k  G  K,  are  continuously  differentiable  in  both 
arguments  for  all  x  G  5,  u  G  Htm, 


(iii)  there  exists  a  constant  C-2  >  0  such  that  \\7ugk{x,rk{x,w)w)Tw\  >  C2  for  all  x  G  S, 
w  G  Ak(x),  and  IgK,  and 


(iv)  P[Ak{x)  f]Mx)\  =  0  for  all  x  G  S  and  k,  l  G  K,  k  A  I- 
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The  particular  set  of  performance  functions  gk(-,-),k  £  K,  arising  in  an  application  may 


not  satisfy  Assumption  2.2(i).  However,  it  is  always  possible  to  define  an  artificial  perfor¬ 


mance  function  gx+i(x,  u)  =  p—  ||u||,  with  a  sufficiently  large  p  >  0,  replace  I  by  I  + 1,  and 


set  C /  =  {K  +  1}.  Then,  J~(x)  satisfies  Assumption  2.2(i).  This  is  equivalent  to  enlarging 


the  failure  domain.  The  probability  associated  with  the  enlarged  failure  domain  is  slightly 


larger  than  the  one  associated  with  the  original  failure  domain.  The  difference,  however, 
is  no  greater  than  1  —  Xm(/°2)  and  therefor  negligible  for  sufficiently  large  p.  Consequently, 


Assumption  2.2(i)  is  not  restrictive  in  practice. 


Assumption  2.2(iii)  can  be  difficult  to  verify.  However,  it  is  our  experience  that  values 


of  performance  functions  arising  in  practice  tend  to  change  as  one  moves  from  .A(x)c  into 


the  interior  of  the  failure  domain  for  a  fixed  x.  If  this  was  not  the  case,  a  perturbation  of 


a  scenario  would  have  resulted  in  no  change  in  the  performance  measure,  which  is  unlikely 


in  mechanical  structures.  Assumption  2.2(iv)  states  that  only  one  performance  function  is 


active  at  each  point  on  the  boundary  of  the  failure  domain  almost  surely.  Since  performance 


functions  represent  different  performance  measures  in  a  structure,  they  tend  to  have  quite 


different  forms.  If  two  performance  functions  are  identical  on  significant  subsets,  then  one 
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of  them  is  redundant  and  remodelling  is  appropriate. 


Lemma  2.1.  Suppose  Assumptions  2.1  and  2.2  hold  on  a  sufficently  large  subset  of  IRn 
containing  a  compact  set  Xq.  Then, 

(i)  there  exists  a  constant  C  <  oo  such  that  \<fi(x,w)  —  4>(x',w)\  <  C\\x  —  a;,||  for  all 
x,  x'  G  Xq  and  w  G  IB, 

(ii)  for  each  fixed  x  G  Xq,  is  continuously  differentiable  at  x  for  P-almost  all 

w  G  IB, 

(iii)  the  collections  {(/}(x,w)}xex0  and  {Xx4>{x,w)}X£X0  are  uniformly  integrable,  i.e. , 

lim  sup  /  \4>(x,  w)\P(dw)  =  0  (11) 

T1-*00  iGA'o  J{ragB  |  \<f>(x,w)\>~t} 

and  similarly  with  \4>(x,w)\  replaced  by  \\Vx(j)(x,  «;)||. 

Proof:  First  consider  (i).  By  Assumption  2.2(i),  rk(x,w )  <  oo  for  all  x  G  Xq,  w  G  IB,  and 
k  G  K(x,u>).  Hence,  in  view  of  Assumptions  2.2(ii-iii)  and  2.1(h),  the  implicit  function 
theorem  gives  that  rk(-,w )  is  continuously  differentiable  for  all  x  G  Xq.  w  G  IB,  and  k  G 
K(  x,  w),  and 

Xxrk{x,  w)  =  -Xxgk(x,  rk(x ,  w)w)/Xugk(x,  rk(x,  w)w)T w  (12) 
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when  defined.  Consequently,  it  can  be  deduced  from  Theorem  5.4.5  in  Ref.  16  that  for  all 


w  £  IB,  (j>(x,w )  has  directional  derivatives  on  x  £  Xo  in  all  directions  with  respect  to  its 
first  argument  and  the  directional  derivative  in  the  direction  h  £  IRn  is  defined  by 


d(/)(x,w,  h)  =  max  min  —2fx2(r\(x,w))rii(x,w)Xxrk{x1w)Th,  (13) 

i£l(x,w)  k£Ci(x,w) 

where  fx 2  (•)  is  the  probability  density  function  of  the  Chi-square  distribution.  Using  As¬ 
sumption  2.2(iii)  and  the  fact  that  rk(x,w),k  £  K.(x,w)  are  bounded  on  Xq  for  all  w  £  IB, 
(12)  is  bounded  for  all  x  £  Xq,  w  £  IB,  and  k  £  K (x,w).  Since  the  nrax-nrin  in  (13)  is 
only  over  i  £  I (x,w)  and  k  £  Ci(x,w),  it  follows  from  the  definition  of  Ji(x,w)  and  the 
boundedness  of  (12)  that  (13),  given  an  h  £  !Rn,  is  bounded  for  all  x  £  Xq  and  w  £  IB. 
Hence,  1 4>(x,  w)  —  4>(x' ,  rc)|  <  C\\x  —  x'\\  for  all  x,  x'  £  Xq  and  re  £  IB. 

Now  consider  (ii).  Let  x'  £  Xq  be  arbitrary.  For  P-almost  all  re  £  IB,  it  follows  from 
Assumption  2.2(iv)  that  the  ray  from  0  £  !Rm  in  the  direction  of  w  does  not  intersect 
Ak(x')  0  ^i(x')  for  any  k,l  £  K,  /  l.  Let  w'  £  IB  be  such  that  the  corresponding 
ray  has  this  property.  Consequently,  the  set  K.(x',w')  has  cardinality  one.  Suppose  that 
k!  =  K ~.(x\w').  Then,  gk'{x',  r(x',  w')w')  =  0,  where  r(x,w )  =  minjei  max^gCi  fk(x,w).  By 
Assumption  2.1(h)  and  continuity,  there  exists  a  neighborhood  X'0  C  Xq  of  x'  such  that 
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k'  =  K(x,  w')  and  gk'(x,r(x,w')w')  =  0  for  all  x  G  Xq.  Hence,  cf>(x,w')  =  1  —  Xm(rk'(x’w ')) 
for  all  x  G  Xq  and,  due  to  the  smoothness  of  rk'(-,w'),  (j>(x,  w')  is  continuously  differentiable 
at  x'  with 


Vx(t>(x',w') 


2fx2(rl(x',w'))rk,{x\w') 


V  Xgk>(x'  ,rki{x'  ,w')w') 
Vugk'(x',  rki(x',  w')w')Tw' ' 


(14) 


Finally,  consider  (iii).  Clearly,  \(j)(x,w)\  <  1  for  all  x  G  Xq  and  w  G  IB  and  (11)  holds. 
By  Assumption  2.2(i-iii)  and  the  fact  that  Xo  is  compact,  it  follows  that  for  all  w  G  IB,  the 
right-hand  side  of  (14)  is  uniformly  bounded  on  Xo-  Hence,  {Xx(j)(x,w)}x^x0  is  uniformly 
integrable.  □ 


In  view  of  Lemma  2.1,  the  next  result  follows  directly  from  Proposition  2.1  in  Ref.  20. 


Proposition  2.2.  If  Assumptions  2.1  and  2.2  hold  on  a  sufficiently  large  subset  of  IRn 
containing  a  convex  and  compact  set  Xo,  then  p(-)  is  continuously  differentiable  on  Xo  and 
its  gradient  is  given  by 

Vp(x)  =  E[Vx(/)(x,w)],  (15) 


where  X7xcj)(x,w)  is  given  in  (14)  with  k'  G  K(x,tc).  □ 


We  estimate  the  expectations  in  (4)  and  (15)  by  Monte  Carlo  sampling.  Consider  an 
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infinite  sequence  of  sample  points,  each  generated  by  independent  sampling  from  P.  Let 


IB  =  IB  x  IB  x  ...  and  let  P  be  the  probability  distribution  on  IB  generated  by  P.  Let 
subelements  of  w  G  IB  be  denoted  Wj  G  IB.j  =  1,2,...,  i.e. ,  w  =  (w\,W2,  ■■■)■  For  every 
w  G  IB,  we  define  the  estimator  of  (4): 

N 

pN{x,  w)  =  J2  Wj)/N.  (16) 

3= 1 

The  asymptotic  property  of  this  estimator  is  given  by  the  next  well-known  result  (see  e.g. 
Ref.  21  for  a  proof  which  holds  under  weaker  assumptions  than  stated  here.). 

Proposition  2.3.  If  Assumptions  2.1  and  2.2  hold  on  a  sufficiently  large  subset  of  IRn 
containing  a  compact  set  Xq.  then,  for  P-almost  all  w  G  IB, 


lim  sup  \pn(x,w)  —  p(x)\  =  0. 

N^oo  xeXo 


(17) 


□ 


Since  4>(-,w)  is  only  Lipschitz  continuous  (Lemma  2.1  (i) ) ,  we  observe  that  pn(-,w)  is 


generally  nonsmooth.  However,  as  the  next  proposition  shows,  px(-,w)  has  directional 

derivatives  and  a  nonempty  subgradient5  for  all  w  G  IB. 

5See  e.g.  Definition  5.1.31  in  Ref.  16.  This  type  of  subgradient  is  sometimes  referred  to  as  a  regular 

subgradient  (Definition  8.3  in  Ref.  22) 
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Proposition  2.4.  Suppose  Assumptions  2.1  and  2.2  hold  on  a  sufficiently  large  subset  of 


IRn  containing  a  compact  set  Xq.  Then,  for  all  w  E  IB,  pn(-,w)  is  Lipschitz  continuous  on 
Xq  and  has  a  nonempty  subgradient  &pn(x,w )  defined  by 

N 

dpN(x,w)  =  Y^d<l)(x,  wj)/N,  (18) 

i=i 

where 


d(j)(x,  Wj ) 


conv 

k£K.(x,Wj) 


V xdk {x,Vk(x,  Wj )  Wj  )  \ 
Vugk(x,  rk(x,Wj)wj)TWj  J  ’ 


(19) 


with  fx 2  being  the  Chi-square  probability  density  function. 

Proof:  It  follows  from  Lemma  2. l(i) ,  that  for  all  w  E  IB,  ppf(-,w)  is  Lipschitz  continuous 
on  Xq.  It  can  be  deduced  from  Theorem  5.4.5  in  Ref.  16  that  for  all  w  E  IB,  cf has 
directional  derivatives  with  respect  to  its  first  argument  in  all  directions.  Hence,  it  follows 
from  Lemma  1  in  Ref.  23  that  for  all  w  E  IB,  pw(x,w)  has  directional  derivatives  in  all 
directions  for  all  x  E  Xq.  Furthermore,  a  slight  generalization  of  Corollary  5.4.6  in  Ref. 
16,  yields  that  the  directional  derivative  of  p^{x,w)  is  identical  to  the  (Clarke)  generalized 
directional  derivative  of  pn(x,w).  Hence,  8pn(x,w )  is  identical  to  the  (Clarke)  generalized 
gradient  of  pn(x,w),  which  is  nonempty.  A  slight  extension  of  Corollary  5.4.6  in  Ref.  16, 
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yields  the  formula  above. 


□ 


By  Lemma  2.1,  the  next  result  follows  directly  from  Proposition  2.2  in  Ref.  20. 

Proposition  2.5.  Suppose  Assumptions  2.1  and  2.2  hold  on  a  sufficiently  large  subset  of 
IRn  containing  a  convex  and  compact  set  Ao-  Then,  for  P-almost  all  w  G  IB,  dpj\r(x,w ) 
converges  uniformly  to  Xp(x),  i.e., 

lim  sup  sup  lid  —  Vp(x)||  =  0.  (20) 

N^ooxeXo  dedPN(x,w) 

□ 

Using  (16),  we  define  a  sequence  of  approximating  problems.  For  any  w  G  IB  and 
N  G  IN  =  {1,2,...},  let  the  sample  average  approximating  problem  (P n(w))  be  defined  by 
(Pat(Ic))  min  {c(x)  |  pn(x,w)  <  q,  x  G  A}  .  (21) 

a;  SIR'1 

Intuitively,  (Pjy(w))  becomes  a  better  “approximation”  to  (P)  as  N  increases.  In  fact,  epi- 
convergence  characterizes  this  effect  more  precisely,  as  we  see  in  the  next  proposition  (see 
e.g.  Theorems  3. 3. 2-3. 3. 3  in  Ref.  16  for  a  proof),  which  requires  a  constraint  qualification: 

Assumption  2.3.  Given  w  G  IB,  we  assume  that  for  every  satisfying  p(x)  <  q,  there 

exists  a  sequence  {xat}“=1  C  X,  with  pn(xn,w)  <  q,  such  that  xn  — *  x,  as  N  — >  oo.  □ 
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Proposition  2.6.  Consider  the  sequence  of  approximate  problems  {Pn(u>)}n=i-  Suppose 
that  Assumptions  2.1  and  2.2  hold  on  a  sufficiently  large  convex  and  compact  subset  of  IRn. 
Then,  for  P-almost  all  w  G  IB,  the  following  holds: 


(i)  If  Assumption  2.3  is  satisfied  at  w  G  IB,  then  {Pjv(t<;)}“=1  epi-converges  to  P. 


(ii)  If  Assumption  2.3  is  satisfied  at  w  G  IB  and  {xn}^=1  is  a  sequence  of  global  minimizers 
of  {Pjv(ui)}jy=i,  then  every  accumulation  point  of  {xat}^=1  is  a  global  minimizer  of 

P.  □ 


Before  presenting  an  algorithm,  we  need  to  strengthen  the  result  in  Proposition  2.3. 


Proposition  2.7.  Suppose  that  Assumptions  2.1  and  2.2  hold  on  a  sufficiently  large  subset 
of  IR"  containing  a  compact  set  Xq.  Then,  for  P-almost  all  Tv  G  IB  there  exists  a  constant 
C  <  oo  such  that  for  all  x  G  Xq  and  N  G  IN, 


\pN(x,w)  -p(x) I  <  CV(log log  N)/N.  (22) 


Proof:  Let  G  be  defined  by  G(x,w )  =  (j)(x,w)  —  p(x)  for  all  x  G  Xq ,w  G  IB.  By  Lemma 
2.1  and  (4),  G  is  centered  and  G(-,w)  G  C(Xq),  where  C(Xf)  is  the  space  of  continuous 
functions  on  Xq.  Furthermore,  by  Assumptions  2.1  and  2.2(ii-iii)  and  the  implicit  function 
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theorem,  ( ■ ,  •)  is  continuous  for  all  x  G  Xo,  w  G  IB,  and  k  G  K(x,ta).  Hence,  it  can  be 
deduced  from  Corollary  5.4.4  in  Ref.  16  that  r(-,  •)  =  minjgi  max^gCi  ?"&(•,  •)  is  continuous  on 
Xo  x  IB.  Since  Xo  and  IB  are  compact,  it  follows  that  r(-,  •)  is  uniformly  continuous  on  Xo  x 
IB.  Let  f  :  IB  — »  C(Xo)  be  defined  by  f{w )  =  r(-,w).  Then,  it  follows  that  f  is  continuous 
on  IB  and  hence  measurable  with  respect  to  the  Borel  sets  in  C(X o).  Consequently,  G  is  also 
measurable.  Hence,  G  is  a  random  variable  with  values  in  a  separable  Banach  space.  Define 
||G||oo  =  suPa;eXo  |G(a;,it>)|.  A  corollary  of  Theorem  8.11  in  Ref.  24,  page  217,  is  that  if  G 
is  a  centered  Banach  space  valued  random  variable  such  that  £,(||G'||^0/loglog  HGIjoc)  <  oo 
and  such  that  the  central  limit  theorem  holds  for  G  then  the  law  of  the  iterated  logarithm 
also  holds  for  G.  Since  G  is  bounded,  £,(||G||^0/loglog  HGjloo)  <  oo.  Hence,  it  only  remains 
to  show  that  the  central  limit  theorem  holds  for  G. 


Let  X(e,Xo,||  •  ||)  be  the  covering  number,  i.e.,  the  minimal  number  of  open  balls 
B°(x,e)  =  {x'  G  !Rn|||a;/  —  x\\  <  e}  needed  to  cover  Xo-  Since  Xo  is  compact,  there 
exists  a  constant  r]  <  oo  such  that  N(e,X o,  ||  •  ||)  <  (rj/e)71  for  all  e  >  0.  Hence,  the  entropy 
integral 


\/logX(e,Xo, 


•  | \)de  <  /  y/ n{ log r]  —  log e)de  <  oo. 


(23) 
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By  Lemma  2.1  (i) ,  G  has  Lipschitz  continuous  sample  paths  with  a  square-integrable  Lips- 
chitz  constant.  Hence,  it  follows  by  the  Jain-Marcus  theorem  (see  Ref.  25,  Example  2.11.13) 
that  the  central  limit  theorem  holds.  Consequently,  the  law  of  iterated  logarithm  holds  for 
G.  Hence,  for  P-almost  all  w  £  IB  and  all  x  6  Xq,  (22)  holds  for  some  C  and  sufficiently 
large  N.  By  increasing  C,  the  result  can  be  made  to  hold  for  all  N.  □ 

3  Algorithm  Model 

Before  presenting  an  algorithm  model,  we  present  optimality  conditions  for  (P).  Under 
Assumptions  2.1  and  2.2,  (P)  is  a  nonlinear  program  involving  continuously  differentiable 
functions,  with  stationary  points  defined  by  the  F.  John  conditions.  We  find  it  convenient 
to  express  the  F.  John  conditions  (see  Theorems  2.2.4  and  2.2.8  in  Ref.  16)  by  means  of  a 
nonpositive,  continuous,  optimality  function  9  :  IR"  — >  1R,  which  is  defined  by 

9{x)  =  —  min{zTb(x)  +  zT B(x)T B(x)z/(25)}  ,  (24) 

z$LZ 

with  Z  =  {z€  1RJ+2  |  J2i=i  z(l)  =  h  z{l)  >  0,  1  =  1 , ...,  J  +  2}, 

b(x)  =  [p/^{x)+,^{x)+  -p{x)  +  q,i>(x)+  -  fi(x),...,il>(x)+  -  fj{x))T,  (25) 
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and  B(x)  =  (Vc(x),  Vp(x),  V/i(x), V/j(x)),  where  i/j(x)  =  max{p(x)  —  q,  maxjej  fj{x)} 


il>(x)+  =  max{0,  ip(x)},  and  7,  <5  >  0.  For  any  x  G  X  such  that  p(x)  <  q.  the  F.  John 
conditions  hold  if  and  only  if  9(x)  =  0  (Theorem  2.2.8  in  Ref.  16). 

Since  neither  p(x)  nor  X7p(x)  can  be  evaluated  exactly  in  finite  computing  time,  an  al¬ 
gorithm  for  (P)  involving  the  evaluations  of  p(x)  and  Vp(x)  is  conceptual.  We  construct  an 
implementable  algorithm  by  using  Algorithm  Model  3.3.27  in  Ref.  16.  For  completeness,  the 


algorithm  model  is  presented  below.  The  algorithm  model  makes  use  of  an  approximate  al¬ 
gorithm  map  An,w  :  IRn  — ►  2^"  involving  p^{x,w)  and  8pn(x,w).  Note  that  the  algorithm 
map  can  be  set-valued.  The  algorithm  model  also  uses  the  function  TV  :  IRn  x  IRn  — ►  1R  in 


a  precision-adjustment  rule,  where 


Fn(x',x")  =  max{c(x")  -  c(x')  -  'yipN(x')+,ipN(x")  -  ^N(x')+}, 


(26) 


^n(x)  =  max  <  pn(x,  w)  —  q,  max  fj(x) 

ieJ 


(27) 


iPn(x)+  =  max{0,  ^n(x)},  and  7  >  0. 

Algorithm  Model  for  Solving  (P).  (Adapted  from  Algorithm  Model  3.3.27,  Ref.  16) 


Parameters.  r  e  (0,1),  7  >  0. 
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Data,  xo  €  IRn,  an  unbounded  set  M  of  positive  integers,  and  a  collection  w  =  (w\,W2,  ■■■)  G 
IB  of  independent  sample  points  from  P. 


Step  0.  Set  i  =  0  and  N  =  min  J\f  . 


Step  1.  Compute  y  G  A^^(xi). 


Step  2.  If 

FN(xi,y)  <  -rj  ^ \J (log log N) / N^j  ,  (28) 


then  set  Xj+i  =  y,  Ni  =  N ,  replace  i  by  i  +  1,  and  go  to  Step  1. 


Else,  replace  N  by  rnin{Ar/  G  AT\N'  >  N},  and  go  to  Step  1.  □ 


Note  that  the  algorithm  model  uses  a  precision-adjustment  rule  (28)  to  ensure  that  the 
error  in  function  evaluations  is  sufficiently  small  in  comparison  to  algorithmic  progress. 
Given  t  and  rj  as  well  as  a  set  of  sample  sizes  J\f,  the  rule  determines  how  fast  the  sample 


size  is  increased.  Empirical  evidence  from  the  areas  of  optimal  control  and  semi-infinite 


optimization  indicates  that  such  a  feedback  rule  is  computationally  more  efficient  than 


using  a  predetermined  schedule  for  increasing  the  sample  size.  To  ensure  convergence  of  the 


algorithm  model,  we  adopt  the  following  assumption  regarding  the  algorithm  map  in  Step 
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1.  For  brevity,  for  any  x  £  IRn  and  p  >  0,  let  B(x,p )  =  {x'  £  IRri|||x  —  x'\\  <  p}. 

Assumption  3.1.  Given  S  C  !Rn,  we  assume  that  the  algorithm  map  :  IRn  — >  2®" 

satisfies  the  following  property  for  P- almost  all  w  £  IB: 

For  every  x  £  S  with  9(x)  <  0,  there  exist  5X  >  0,  Nx  £  IN,  and  px  >  0  such  that 
Fn(x',  y)  <  -5X  for  all  N  >  Nx,  x’  £  B(x,px),  and  y  £  AN^{x'). 

In  the  next  section,  we  present  one  particular  algorithm  map  that  satisfies  Assumption 
3.1.  The  convergence  of  the  algorithm  model  is  given  by  the  next  theorem,  which  can  be 
proven  using  the  same  arguments  as  in  the  proof  of  Theorem  3.3.29  in  Ref.  16. 

Theorem  3.1.  Suppose  that  Assumptions  2.1,  2.2,  and  3.1  hold  on  a  sufficiently  large 
subset  of  !Rn  and  that  the  algorithm  model  for  solving  (P)  has  constructed  a  bounded 
sequence  ^  ^  an  accumulation  point  of  {ic,}“0,  then  x  is  a  stationary  point  for 

(P),  i.e.,  9{x)  =  0,  P-almost  surely.  □ 
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4  Implementation 


Consider  the  following  set-valued  algorithm  map,  which  is  a  generalization  of  the  Polak-He 


algorithm,  see  Section  2.6  in  Ref.  16.  For  any  Xi  G  IR",  N  G  IN,  and  w  G  IB,  we  define 


AN,w(xi)  =  {xi  +  \N(xi,d)hN(xi,d)\d  G  dpN(xi,  w)}, 


(29) 


where  the  Armijo  stepsize  is  given  by 


A  iy(xi,d)=  max 
fce{o,i,2,.. 


|/3fc  |  FN(xi,Xi  +  pkhN(xi,d))  <  pka0N(xi,d )| 


(30) 


with  Fjv(-,  •)  as  in  (26), 


9N(x,d ) 


—  min  ^zTb^{x)  +  ztBn(x,  d)T Bjy(x,  d)z/(25)]  , 


(31) 


where  6  >  0,  Z  is  defined  as  in  (24), 


bN(x )  =  (yip n{x)+,'iPn(x)+  —  pN(x,  w)  +  q,'ipN(x)+  -  f^x),  ...,ipN(x)+  -  fj{x))T,  (32) 

BN(x,d)  =  (Vc(s),d,V/i(s),...JV/j(s)),  (33) 


a  G  (0, 1],  and  (3  G  (0, 1).  Finally,  the  search  direction 


hN(xi,d)  =  -BN(xi,d)z/6, 


(34) 
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where  z  is  any  solution  of  (31).  The  parameter  7  in  (26)  and  (31)  should  be  set  equal  to  7 
in  Fpf(-,  •)  in  the  algorithm  model.  The  problem  in  (31)  is  quadratic  and  can  be  solved  in 
a  finite  number  of  iterations  by  a  standard  QP-solver  (e.g.  Quadprog  Ref.  26). 

Our  next  result  shows  that  (29)  satisfies  Assumption  3.1.  Hence,  this  algorithm  map 
combined  with  the  algorithm  model  in  a  convergent  implementable  algorithm. 


Proposition  4.1.  Suppose  that  Assumptions  2.1  and  2.2  hold  on  an  open  set  S  C  IRn.  For 
any  TV  E  IN  and  w  E  IB,  let  the  algorithm  map  Anw{')  be  defined  by  (29),  with  the  same 
values  of  the  parameters  a,  (3,5,  and  7  for  all  N  E  IN.  Then,  Ajv,®(-)  satisfies  Assumption 
3.1  on  any  convex  and  compact  subset  of  S  for  P- almost  all  w  E  IB. 


Proof:  Let  Xo  C  S  be  convex  and  compact.  For  P- almost  all  w  E  IB,  the  search  direction 
hx{x,  d )  is  bounded  for  all  x  E  Xq  and  d  E  dpx(x,  w)  because  it  is  defined  as  a  linear  combi¬ 
nation  of  bounded  vector-valued  functions  (see  (34),  (33),  Proposition  2.4,  and  Assumption 
2.2).  For  P- almost  all  Tv  E  IB,  the  bound  is  independent  of  N  due  to  Proposition  2.5.  Since 
S  is  open,  there  exists  a  Ai  E  (0, 1]  such  that  x  +  \hj\r(x,d)  E  S  for  P-almost  all  w  E  IB 
and  for  all  x  E  Xq,  A  E  (0,  Ai],  N  E  IN,  and  d  E  dpN{x,w). 

It  is  seen  from  Proposition  2.4  and  its  proof  that  for  all  w  E  IB  and  iES,  pn(x,w )  is 
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Lipschitz  continuous,  with  a  directional  derivative  equal  to  the  (Clarke)  generalized  direc¬ 


tional  derivative.  Hence,  Lebourg  mean-value  theorem  (see  e.g.  Theorem  5.4.13b  in  Ref. 
16)  is  applicable.  By  the  Lebourg  mean- value  theorem  and  the  fact  that  Ai  <  1,  we  obtain 
that  for  P-almost  all  w  G  IB  and  for  all  x  G  Xq,  A  G  (0,  Ai],  d  G  dpi\r(x,w),  and  N  G  IN, 
there  exist  some  s,  Sj  G  [0, 1],  j  =  0, 1,  2, ...,  J,  and  dl  G  dpi\r(x  +  s\hjy(x,  d),w )  such  that 

P/v(x,  x  +  \h]\r(x,  d))  <  Arnax  |  —  7'0_v(x)+  +  Vc(x)Th]\r(x,  d) 

+  (Vc(x  +  soA/?.tv(x,  d))  —  Vc{x))ThN{x,  d), 

Pn(x,  w)  -  q  -  iPn{x)+  +  dTh,N{x,d )  +  {d'  -  d)Th,N{x,d),  (35) 

max{/j(x)  -  ^n(x)+  +  V fj(x)ThN(x,  d) 
is  J 

+  (Vfj(x  +  SjXhN(x,d))  -  Vfj(x))ThN(x,d)} 

For  any  e  >  0,  it  follows  from  Proposition  2.5  that  for  P-almost  all  w  G  IB  there  exists 
aiVjGlN  such  that  for  all  N  >  Ne 


sup  sup  \\d  —  Vp(x)||  <  e/3.  (36) 

x€S  d£dpj\f(x,w) 


Furthermore,  Vc(-),  V/j(-),  j  G  J,  and  Vp(-)  are  uniformly  continuous  on  compact  sets  and 
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h]\r(x,d)  is  bounded  almost  surely.  Hence,  for  P-almost  all  w  G  IB  there  exists  a  Ae  <  Ai 
such  that  for  all  A  G  (0,  Ae],a;  G  Xq,  d  G  3pn(x,w),  and  IV  G  IN  we  have 


\\Xp(x  +  s\h]y(x ,  d))  -  Vp(x)  ||  <  e/3, 


(37) 


||  Vc(x  +  soXh]y(x,  d))  —  Vc(.t)||  <  e, 


(38) 


\\V fj(x  +  Sj\hN(x,d))  -  Vfj{x) ||  <  e,  j  G  J. 


(39) 


Hence,  using  (36)  and  (37),  we  obtain  that  He?'  —  d||  <  e.  From  (35),  we  then  obtain  that 


for  P-almost  all  w  G  IB  and  for  all  x  G  Xq,  A  G  (0,  Ae],  d  G  dpj\r(x ,  re),  and  N  >  Ne 


Fn(x,  x  +  Xh]\r(x,  d))  <  Amax<  —  r)ij)N{x)+  +  Vc(x)t/iat(x,  d) 


pN(x,w)  -  q  -  i/jn{x)+  +  dThN(x,d), 


(40) 


max{/j(x)  -  i>N{x)+  +  V fj(x)1  hN(x,d)}  >  +  Xe\\hN(x,d)\\. 
j  6J 


We  can  deduce  from  Theorem  2.2.8  in  Ref.  16  that 


9n{x ,  d)  =  max  <  —  r)ipN{x)+  +  Vc(x)ThN(x ,  d), 


Pn(x,w)  -  q  -  iPn(x)+  +  dThN(x,d), 


(41) 


ma *{fj(x)  -  n(x)+  +  V  fj{xy  hN(x1d)}  )  +  S\\hN(x,  h)||  /2. 
j  6J 
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Hence,  by  adding  and  subtracting  A<5||/ijvOc,  d)  ||2/2  from  (40),  we  obtain 


Fn(x,x  +  \Iin(x ,  d))  <  X9n(x ,  d)  +  A(e  -  <5||/iat(x,  d)||/2)||/ijv(x,  d)||  (42) 

for  P-almost  all  w  E  IB  and  for  all  x  E  Xq,  A  E  (0,  Ae],  d  E  dpN(x,ui),  and  IV  >  iVe. 

Now  suppose  that  x*  E  Xq  is  such  that  9(x*)  <  0.  Without  loss  of  generality,  we  assume 
that  x*  is  in  the  interior  of  Xq.  Let  h(x)  be  given  by 

h(x)  =  —B{x)Tz/5,  (43) 

where  B{x)  is  defined  as  in  (24),  <5  >  0  is  as  in  (24),  and  z  is  any  solution  of  (24).  Since 
h{x*)  =  0  implies  9(x*)  =  0,  ||/i(aj*)||  ^  0.  Then,  by  continuity  of  O(-)  and  h(-)  (see  Theorem 
2.2.8  in  Ref.  16),  there  exist  <5i  >  0  and  px *  >  0  such  that  Q[x)  <  — <5i  and  ||/t(x)||  >  <5i 
for  all  x  E  B(x*,px*).  Set  e*  =  55i/A.  By  Proposition  7.1  (see  Appendix),  for  P-almost 
all  w  E  IB  there  exists  an  Nx*  >  Ne*  such  that  9^{x,d)  <  —5\/2  and  \\h]y(x,  d)  ||  >  S\/2 
for  all  x  E  B(x*,px*)  and  d  E  dpj\r(x,w).  Since  e*  —  5\\]in(x,  d)||/2)  <  0  for  P-almost 
all  w  E  IB  and  for  all  x  E  B(x*,px*)  and  d  E  8pn(x,w),  it  now  follows  from  (42)  that 
Fn(x,x  +  XhN(x,d))  <  X0N(x,d)  <  0  for  P-almost  all  w  E  IB  and  for  all  x  E  B(x*,px *), 
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A  G  (0,  Ae*],  d  G  dp]\f(x,w),  and  N  >  Nx* .  Hence,  for  algorithm  parameter  a  G  (0, 1], 

Fj\r(x,  x  +  A h^{x,  d))  —  Xad^ix,  d)  <  A(1  —  a)d]y(x,  d)  <  0  (44) 

for  P-almost  all  w  G  IB  and  for  all  x  G  B(x*,px*),  A  G  (0,  Ae*],  d  G  dpjsr(x,w),  and  N  >  Nx*. 
Consequently,  for  any  x  G  B(x*,px*)  and  N  >  Nx* ,  the  algorithm  map  -Ajwuj(-)  has  stepsize 
A at (x,d)  >  /3\e*  for  any  d  G  dpiy(x,w )  and  for  P-almost  all  w  G  IB.  Hence,  for  any 
x  G  B(x*,px*),  d  G  8pn(x,w),  and  N  >  Nx*, 

Fn(x,ij)  <  a\]\ r(x,d)0N(.x,d)  <  a/3Xe*9isr{x,d )  <  —af3Xt*5i/2  (45) 

for  all  y  G  A^iv,(xr)  and  P-almost  all  w  G  IB.  This  completes  the  proof.  □ 

Usually,  the  one-dinrensional  root  finding  problems  in  the  evaluation  of  rk(x,w),  k  G 
K(x,tc),  cannot  be  solved  exactly  in  finite  computing  time.  One  possibility  is  to  introduce 
a  precision  parameter  that  ensures  a  gradually  better  accuracy  in  the  root  finding  as  the 
algorithm  progresses.  Alternatively,  we  can  prescribe  a  rule  saying  that  the  root  finding  al¬ 
gorithm  should  terminate  after  CNt  iterations,  with  C  being  some  constant.  For  simplicity, 
we  have  not  discussed  the  issue  of  root  finding.  In  fact,  this  issue  is  not  problematic  in  prac¬ 
tice.  The  root  finding  problems  can  be  solved  in  a  few  iterations  with  high  accuracy  using 
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standard  algorithms.  Hence,  the  root  finding  problems  are  solved  with  a  fixed  precision  for 


all  iterations  in  the  algorithm  giving  a  negligible  error. 


5  Numerical  Example 


We  consider  the  same  highway  bridge  as  in  Ref.  12.  The  objective  is  to  design  a  reinforced 
concrete  girder  for  minimum  cost  using  nine  design  variables  (n  =  9).  Uncertainty  is 
modeled  using  eight  random  variables  (m  =  8).  We  assume  that  the  girder  can  fail  in  four 
different  modes.  Failure  occurs  if  any  of  the  four  failure  modes  occur.  This  gives  rise  to  four 
performance  functions.  To  ensure  that  J-(x)c  is  bounded  (see  Assumption  2.2(i) ) ,  we  define 
an  artificial  performance  function  g§(x,  u)  =  8  —  ||u||.  Note  that  this  implies  an  enlargement 
of  the  failure  domain,  but  the  increase  in  the  failure  probability  is  less  than  10-10.  This 
leads  to  five  performance  functions  and  C,;  =  {i},i  E  I  =  {1,2,. ..,5}  (see  (2)).  We  impose 
the  constraint  p(x)  <  q  =  0.001350,  as  well  as  23  other  deterministic,  nonlinear  constraints. 
For  details  about  the  example,  we  refer  to  Ref.  12.  It  should  be  noted  that  the  performance 


functions  are  nonlinear  and  sufficiently  differentiable.  We  are  unable  to  verify  analytically 
that  the  performance  functions  satisfy  Assumption  2.1  (ii) .  However,  we  estimate  the  failure 
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probability  for  the  first  and  last  iterations  using  the  estimator  Ylf=i  Ir(x){uj) /N  of  p(x), 
where  m,U2,  ■■■,un  are  independent,  identically  distributed  standard  normal  vectors  and 
Ijr/X){uj)  =  1  if  Uj  G  .F(x),  and  zero  otherwise,  and  find  it  not  significantly  different  from  the 
estimates  obtained  using  (16).  Furthermore,  we  do  not  experience  numerically  difficulties, 
which  could  have  been  expected  in  an  example  not  satisfying  Assumption  2.1  (ii) .  Hence,  it 
is  reasonable  to  believe  that  Assumption  2.1(h)  is  satisfied  over  a  sufficiently  large  subset. 

The  resulting  instance  of  (P)  is  implemented  in  Matlab  6.5  (Ref.  26)  and  solved  using 
our  algorithm  model  with  the  algorithm  map  defined  in  (29).  The  evaluation  of  rk(x,w )  is 
performed  using  the  Matlab  root-finder  Fzero,  with  tolerance  1  •  ICR5,  and  (31)  is  solved 
using  Matlab’s  Quadprog.  Parameters  are  r  =  0.9999,  r/  =  0.002,  7  =  2,  a  =  0.5 ,/3  =  0.8, 
and  <5  =  1.  Furthermore,  M  =  {200,1600,5400,12800,25000,...}.  The  computations  are 
terminated  when  the  algorithm  model  reaches  a  sample  size  greater  than  25000. 

After  an  application  of  our  new  algorithm,  we  obtain  an  optimized  structure  with  cost 
13.288.  In  comparison,  the  design  obtained  in  Ref.  12  has  a  somewhat  larger  cost  of  13.664. 
In  this  example,  a  less  reliable  structure  is  also  cheaper.  As  expected,  when  our  algorithm 
was  terminated  the  constraint  p{x)  <  0.001350  was  (approximately)  active:  estimated 
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failure  probability  is  0.001350  with  coefficient  of  variation  0.02.  An  examination  of  the 


design  from  Ref.  12  shows  that  its  failure  probability  is  0.001310  with  coefficient  of  variation 
0.01.  Hence,  a  95%  confidence  interval  of  the  failure  probability  is  (0.001284,  0.001336) 
which  is  outside  the  constraint  limit  0.001350.  From  this  analysis  we  conclude  that  the 
algorithm  in  Ref.  12  may  give  excessively  safe  designs.  The  algorithm  in  Ref.  12  is  based 
on  heuristic  updating  of  first-order  approximations  of  the  failure  probability  and  is  not 
expected  to  lead  to  the  same  accuracy  level  as  our  new  algorithm.  However,  the  algorithm 
in  Ref.  12  appears  to  involve  fewer  evaluations  of  the  performance  functions  and  their 
gradients.  Note  also  that  the  algorithm  in  Ref.  12  is  limited  to  the  special  case  of  C* 
having  only  one  element  for  all  i  €  I. 

6  Conclusions 

We  construct  an  implementable  algorithm  for  nonlinear  stochastic  programming  problems 
with  system  failure  probability  constraints.  First,  we  generalize  an  expression  for  the  fail¬ 
ure  probability  and  show  that  it  is  continuously  differentiable.  Second,  we  prove  a  uniform 
strong  law  of  large  numbers  for  the  estimators  of  the  failure  probability  and  its  gradient.  We 
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also  establish  a  uniform  law  of  the  iterated  logarithm  for  the  estimator  of  the  failure  proba¬ 


bility.  Third,  we  construct  an  algorithm  map  that  satisfies  the  assumptions  of  an  algorithm 
model.  Preliminary  numerical  testing  on  a  realistic  design  problem  demonstrates  the  po¬ 
tential  for  sampling-based  optimization  algorithms  in  structural  engineering.  In  particular, 
the  high  accuracy  of  such  algorithms  compared  to  frequently  used  heuristics  is  promising. 

7  Appendix 

Proposition  7.1.  Suppose  Assumptions  2.1  and  2.2  hold  on  a  convex  and  compact  set 
Xo  C  IRn.  Then,  for  P-almost  all  w  G  IB, 

lim  sup  sup  \9n(x,  d)  —  0{x)\  =  0  (46) 

xGXo  d£dpN{x,w) 

lim  sup  sup  \\fiN(x,d)  —  h(x)\\  =  0  (47) 

iV— >oo  xGXo  d£dpN(x,w) 

where  and  ^jv(t)  are  defined  in  (24),  (31),  (43),  and  (34),  respectively. 

Proof:  Let  e  >  0  be  arbitrary.  We  deduce  from  Propositions  2.3  and  2.5  that  for  P-almost 
all  w  €  IB  there  exists  a  Ne  e  IN  such  that  for  all  N  >  Ne  and  h  £  IRn 

sup  I if(x)+  -  ipN(x)+\  <  e.  (48) 

xeA'o 
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(49) 


sup  sup  \dTh  —  Vp(x)Th\  <  e\\h\\. 

x£Xq  d£dppf(x,w) 


Consequently,  for  P-almost  all  w  £  IB  and  all  N  >  Ne  and  h  £  HU 


sup  sup  |  pn(x,w)  —  q  —  +  dTh 

xGXq  d£dptf(x,w) 


(50) 


~(p(x)  -  q  -  ip(x)+  +  Vp(x)Th)  |  <  2e  + 


Let 


^at(x,  x  +  h,  d)  =  max<  —7 iJjn(x)+  +  Vc(x)Th,  pn(x,  w)  —  q  —  ip n(x)+  +  dTh, 


(51) 


ma?{fj(x)  ~  ^n{x)+  +  V fj (x)  h}  }  +  5\\h\\  /2, 
jeJ 


?/)(x,  x  +  h)  =  max<  — 7?/>(x)+  +  Vc(x)Th,  p(x)  —  q  —  ip(x)+  +  Vp(x)Th, 


(52) 


ma ?{fj(x)  -  i>(x)+  +  vfj{x)Th}\  +  5\\h\\2/2- 

jeJ  '  J 

Using  (48),  (50),  and  the  fact  that  c(-)  and  Vc(-)  are  bounded  functions  on  Xq,  we  find  that 
for  P- almost  all  w  £  IB  and  all  N  >  Ne  and  h  £  ]Rn, 


sup  sup  \ip]\f(x,  x  +  h,  d)  —  ip(x,  x  +  h)\  <  max{7e,  2e  +  e||/i||}.  (53) 

x£Xq  d£dpN(x,w) 


Next,  h(x)  is  bounded  for  all  x  £  Xq  because  it  is  defined  as  a  linear  combination  of  bounded 


vector-valued  functions.  Using  the  same  argument  and  Proposition  2.5  we  have  that  for 


P-almost  all  re  £  IB,  Jin(x,  d)  is  bounded  for  all  x  £  Xq,  d  £  dpjsr(x,  w),  and  N  £  IN.  Hence, 
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for  P-almost  all  w  £  IB  there  exists  a  C*  <  oo  such  that  ||/i(x)||  <  C*  and  ||/ijv(x,  d)  ||  <  C* 
for  all  x  £  Xq,  d  £  8pn{x,w),  and  IV  £  IN.  From  Theorem  2.2.8  of  Ref.  16  we  deduce  that 


On(x,  d)  =  i/)n(x,  x  +  Iin(x,  d),  d)  =  min  iPn(x,  x  +  h,  d),  (54) 

heR" 


9(x)  =  ib{x,  x  +  h(x))  =  min  %b(x,  x  +  h). 

'  '  hei Rn 


(55) 


Let  e*  =  rnaxjye,  2e  +  eC*}.  We  now  have  that  for  P-almost  all  re  £  IB  and  all  x  £  Xq, 


d  £  8pn(x,w),  and  N  >  Ne*, 


9{x)  <  ^(x,  x  +  1in(x,  d))  <  ^ n(x ,  x  +  1in{x,  d))  +  e*  <  6n(x)  +  e* 


(56) 


9(x)  =  ^(x,  x  +  h(x))  >  4>n(x,  x  +  h(x))  —  e*  >  9n(x)  —  e* 


(57) 


Hence,  (46)  holds.  We  now  address  (47).  For  the  sake  of  a  contradiction,  suppose  (47)  is 


not  valid.  Then,  there  exists  a  subset  IBq  C  IB  with  P[IBo]  >  0  such  that  for  every  u>  £  IBq 


there  exist  e  >  0,  {JVj}^,  Nt  —>  oo,  as  i  — >  oo,  C  X0,  {dj}^,  di  £  dpNi(xi,w), 


with  the  property  that 


||  hNi(xi,di)  -  h(xi)  ||  >  e 


(58) 


for  all  i  £  IN.  As  stated  above,  for  P-almost  all  w  £  IB,  h]y(x,d)  is  bounded  for  all 
x  £  Xq.  d  £  dpisr(x,u ;),  and  A"  £  IN.  Consider  an  w  £  IBq  with  this  boundedness  property. 
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Then,  there  exist  an  infinite  subset  Iq  C  IN,  an  i*  £  IRn  and  an  h*  £  IRn  such  that 


Xi  — *  x*,  h,Ni(xi,  di )  — >  h* ,  as  i  — >  oo,  z  £  /o-  Continuity  of  ?/>(•,  •)  and  (53)  imply  that 

lim  \tpNi{xi,Xi  +  hNi(xi,di),di)  -  -ip(x*,x*  +  h*)\  =  0.  (59) 

z— >oo,iE/o 

Hence,  it  follows  Theorem  3.3.2  in  Ref.  16  that  the  problems 

min  ^N.  (xi,  Xi  +  h,  di)  (60) 

heTRn 

epi-converge  to  the  problem 

min  ii(x* ,x*  +  h),  (61) 

heiR" 

as  i  oo,  i  £  Jo-  Since  {hw^Xi,  dj)}“1  is  a  sequence  of  global  minimizers  of  (60),  it  follows 
from  Theorem  3.3.3  in  Ref.  16  that  h*  must  be  a  global  minimizer  of  (61).  Since  the 
problem  in  (61)  is  strictly  convex,  it  has  a  unique  global  minimizer.  Hence,  h*  =  h(x*), 
which  contradicts  (58).  This  completes  the  proof.  □ 
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